svm: allow conflicting transactions in entries #3453

2501babe merged 6 commits into anza-xyz:master
Conversation
```rust
if let Some(program) = (use_program_cache && is_invisible_read)
    .then_some(())
    .and_then(|_| self.program_cache.find(account_key))
{
    // Optimization to skip loading of accounts which are only used as
    // programs in top-level instructions and not passed as instruction accounts.
    return Some(LoadedTransactionAccount {
        loaded_size: program.account_size,
        account: account_shared_data_from_program(account_key, &self.program_accounts)
            .ok()?,
        rent_collected: 0,
    });
}
```
i made this a separate block because once `disable_account_loader_special_case` activates it is simply deleted. we will also be able to remove `program_cache` and `program_accounts` from `AccountLoader` and they become free-floating variables in `transaction_processor.rs` again
also an idea i have been toying with after we do that is to simply implement `TransactionProcessingCallback` for `AccountLoader` itself so we can pass it to program cache setup, which will happen after account loading. this depends on how feasible it is to restrict callbacks to a concrete type (unless we just make two traits)
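A minimal sketch of that idea, using hypothetical stand-in names (`AccountReader` for the real callback trait, simplified `Pubkey` and `AccountSharedData` types): implementing the callback trait directly on the loader lets it be passed wherever a callback is expected.

```rust
use std::collections::HashMap;

// Illustrative stand-ins; the real types live in the SVM crate.
type Pubkey = [u8; 32];

#[derive(Clone, Debug, PartialEq)]
struct AccountSharedData {
    lamports: u64,
}

// A hypothetical callback trait that program cache setup could accept.
trait AccountReader {
    fn get_account(&self, key: &Pubkey) -> Option<AccountSharedData>;
}

// The loader already holds the post-load account states.
struct AccountLoader {
    cache: HashMap<Pubkey, AccountSharedData>,
}

// Implementing the trait on the loader means it can itself serve as the
// callback for steps that happen after account loading.
impl AccountReader for AccountLoader {
    fn get_account(&self, key: &Pubkey) -> Option<AccountSharedData> {
        self.cache.get(key).cloned()
    }
}
```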
```rust
// If lamports is 0, a previous transaction deallocated this account.
// Return None without inspecting, so it can be recreated.
if cache_item.account.lamports() == 0 {
    None
} else {
    Some(cache_item.account.clone())
}
```
while i was getting this pr ready, i realized we dont need to replace zero lamport accounts in the account cache with `AccountSharedData::default()` at all, so that saves us some `Arc::new()` calls. we just need something in there that stops us from going to accounts-db, and we can replace it when reading. `account_cache` is private and `load_account()` is the only function that reads from it, so we dont need to handle them special anywhere else
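That read path can be sketched minimally as follows; the names here (`load_account`, `AccountCacheItem`, the field layout) are illustrative stand-ins for the real SVM types, and the accounts-db fallback is elided:

```rust
use std::collections::HashMap;

type Pubkey = u64; // stand-in for the real Pubkey type

#[derive(Clone, Debug, PartialEq)]
struct AccountSharedData {
    lamports: u64,
}

impl AccountSharedData {
    fn lamports(&self) -> u64 {
        self.lamports
    }
}

struct AccountCacheItem {
    account: AccountSharedData,
}

struct AccountLoader {
    // Private cache of accounts touched earlier in the batch. A zero-lamport
    // entry marks a deallocated account: it stops us from falling back to
    // accounts-db without allocating a fresh default account.
    account_cache: HashMap<Pubkey, AccountCacheItem>,
}

impl AccountLoader {
    // The only reader of account_cache, so the zero-lamport sentinel never
    // needs special handling anywhere else.
    fn load_account(&self, key: &Pubkey) -> Option<AccountSharedData> {
        if let Some(item) = self.account_cache.get(key) {
            if item.account.lamports() == 0 {
                // Deallocated by a previous transaction: report "not found"
                // so the caller can recreate it as a default account.
                return None;
            }
            return Some(item.account.clone());
        }
        // Cache miss: the real implementation falls back to accounts-db here.
        None
    }
}
```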
```rust
self.account_cache.insert(
    *account_key,
    AccountCacheItem {
        account: account.clone(),
        inspected_as_writable: is_writable,
    },
);
```
i left the cache insert unguarded by `disable_account_loader_special_case`, which means it will be used by 2.2 the moment this pr lands. this is safe (@jstarry should confirm):

- there is no change to pre-simd83 consensus if `disable_account_loader_special_case` is not active, because the account cache comes after the program cache, so behavior is identical to if these accounts came from accounts-db
- `disable_account_loader_special_case` is in 2.0, so it is almost guaranteed to activate before the simd83 consensus gate which will relax account locks. but if, for some startling reason, simd83 consensus activates first, it is still fine. there is an edge case where the program cache would shadow the account cache if you close and reopen a loaderv3 buffer. but this only applies if the account is loaded invisible, so it only affects transaction data size, and this is fine because simd83 consensus necessarily can only be activated by its own feature gate
```rust
if let Some(slot_history) =
    account_overrides.and_then(|overrides| overrides.get(&slot_history::id()))
{
    // We set `inspected_as_writable` to `true` because this never needs to be inspected.
    account_cache.insert(
        slot_history::id(),
        AccountCacheItem {
            account: slot_history.clone(),
            inspected_as_writable: true,
        },
    );
}
```
after this pr lands, we can contemplate a more restricted interface for passing only this account through. maybe `slot_history: Option<AccountSharedData>` on the transaction processing environment
bit of an aside, but the `sol_get_sysvar` syscall we added in 2.0 for core bpf program conversion does not support `SlotHistory` by design, we only needed `StakeHistory` and `SlotHashes`. so the good news is it doesnt introduce a mechanism to bypass this override
the possibly better news is... if neither me nor joe needed to add `SlotHistory`, do any important bpf programs or builtins actually use it? i have no idea. maybe we dont need to let programs see this sysvar at all if its only used by the validator itself. if so, we could add some `#[deprecated]` where needed and special-case it like the instructions sysvar in `load_transaction_account` to return an account with no data (would require a feature gate, perhaps we could avoid a simd)
```rust
account_data.set_rent_epoch(u64::MAX);
account_data.set_lamports(1);
```
the test account is read-only, but this is now necessary because svm would put it in the account cache the first time it loads it in an entry and then assume it had been deallocated the second time it loads it. the svm behavior is correct, because there should be no way to bring an account to zero lamports without it being writable, therefore we should never be able to see one in a read-only context in the first place. our builtins and sysvars have dust lamports
```rust
slot_history::id(),
AccountCacheItem {
    account: slot_history.clone(),
    inspected_as_writable: true,
```
I thought there was a discussion on a previous PR about this `inspected_as_writable` stuff.
The function that is called to inspect accounts already has guards against doing it twice.
Is this still necessary?
we never got confirmation on that. ill check with brooks
alright i have clarity on this now, the thing i was missing is that the idea svm has of "what things to inspect" and the idea bank has of "what results of inspection do i actually care about" are disjoint concepts. we can get rid of the state tracking and inspect every time we get an account from accounts-db or account cache. bank has its own dedupe logic and other implementers of inspect can do whatever they want
removed the state tracking we can get rid of
```rust
// Determine a capacity for the internal account cache. This
// over-allocates but avoids ever reallocating, and spares us from
// deduplicating the account keys lists.
let account_keys_in_batch = sanitized_txs.iter().map(|tx| tx.account_keys().len()).sum();
```
To avoid allocating every time, we could use a thread-local variable instead, wdyt.
example:

```rust
thread_local! {
    static HAS_DUPLICATES_SET: RefCell<AHashSet<Pubkey>> = RefCell::new(AHashSet::with_capacity(MAX_TX_ACCOUNT_LOCKS));
}
```
this code is just getting a reasonable upper bound on account cache size. the mechanism in your pr to prevent duplicate transactions in svm isnt required because svm has no concept of duplicate transactions. my understanding is theyre presently implicitly deduped at the account locks level by relying on the fee-payer lock, and we just need to do this explicitly with message hashes instead
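The upper-bound computation being defended here can be sketched minimally as follows, with a hypothetical `SanitizedTx` stand-in for the real transaction type (the real code calls `tx.account_keys().len()` through an accessor):

```rust
// Illustrative minimal stand-in for a sanitized transaction: just its
// account keys list.
struct SanitizedTx {
    account_keys: Vec<u64>,
}

// Over-allocates whenever keys repeat across transactions, but never needs
// to reallocate and never needs to deduplicate the key lists up front.
fn cache_capacity_upper_bound(batch: &[SanitizedTx]) -> usize {
    batch.iter().map(|tx| tx.account_keys.len()).sum()
}
```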
I am not suggesting here to de-duplicate transactions within a batch. The purpose of this example is to point out a `thread_local` variable use case to avoid the allocation overhead for the account cache in every batch.
Regarding the de-duplication of transactions: once conflicting batches are allowed, the account locking function won't implicitly de-duplicate such transactions. Then, as you said, we need to explicitly check that a batch does not contain any duplicate transactions using message hashes.
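That explicit check could look something like this sketch (hypothetical `SanitizedTx` with a bare `message_hash` field; the real type exposes an accessor):

```rust
use std::collections::HashSet;

// Illustrative minimal transaction: only the message hash matters here.
struct SanitizedTx {
    message_hash: [u8; 32],
}

// Post-SIMD-83 the fee-payer write lock no longer rejects duplicates, so a
// batch must be screened for repeated message hashes explicitly.
fn batch_has_duplicates(batch: &[SanitizedTx]) -> bool {
    let mut seen = HashSet::with_capacity(batch.len());
    // insert() returns false on a repeat, so all() fails on the first dupe.
    !batch.iter().all(|tx| seen.insert(tx.message_hash))
}
```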
oh, ty for the suggestion but we dont want to reuse the cache between batches
Just for clarification, according to the previous conversation the second PR containing the account cache was envisioned to be feature-gated.
thats correct, the decision to feature gate svm was made because the necessary logic to first go to the account cache and then possibly fall back to the program cache was extremely brittle and had an enormous number of edge cases. but for unrelated reasons we decided to remove (via feature gate) the program cache from account loading entirely and backported it to 2.0, so we were able to simplify the logic here to be much more sound
Sounds good, so if I understand correctly, the …
```rust
if let Some(slot_history) =
    account_overrides.and_then(|overrides| overrides.get(&slot_history::id()))
{
    account_cache.insert(slot_history::id(), slot_history.clone());
}
```
Should we add a debug assert that the size of `account_overrides` is equal to 1 if the sysvar is present and 0 otherwise? Or is it basically impossible now that we decreased the visibility of the account overrides `set_account` function?
it isnt possible, the only exposed setter is `set_slot_history` now (which accepts `AccountSharedData` and hardcodes the `Pubkey` as `sysvar::slot_history::id()`). we can remove `AccountOverrides` entirely now as a cleanup. im thinking ill look into it when i start on simd186, since id like to change something about simulation results, and this is related to simulation only
```rust
self.callbacks
    .inspect_account(account_key, AccountState::Alive(account), is_writable);
```
Shouldn't we check if lamports is 0 before saying the account is alive? Otherwise you could read a dead account into the cache and later write to it, and even though it's still dead, we say it's alive.
youre right, ill open a pr for this on monday. thankfully it doesnt affect bank because to get here we would have already had to inspect as writable (which is why this was missed; before removing `AccountCacheItem`, inspecting a previously-alive now-dead account never happened at all)
edit: actually there is a slight edge case of taking a dead account read-only, leaving it dead, and making it alive, but this is only possible post-simd83 and the fix is the same
edit2: nevermind there actually is no way to get a bad first inspect because to bring the account to zero lamports in the first place it must necessarily be inspected as writable
Hmm isn't there a bug post simd-83 where you read a dead account in one tx and then write to it in another? The inspect for the write tx would say it's alive when it's actually dead
yes, i made my first edit thinking of that, but made the second edit thinking of the reason why we didnt need to check for equality with `AccountSharedData::default()`. but in this case the account doesnt need to exist
(i have a pr for this now but have been blocked by ci build issues, will add you once its green)
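A sketch of the fix being discussed, assuming simplified stand-ins for `AccountState` and `AccountSharedData`: classify the account by its lamports at inspection time, so a cached-but-deallocated account is never reported alive.

```rust
#[derive(Clone, Debug, PartialEq)]
struct AccountSharedData {
    lamports: u64,
}

// Simplified stand-in for the state passed to the inspect callback.
#[derive(Debug, PartialEq)]
enum AccountState {
    Alive(AccountSharedData),
    Dead,
}

// Before handing the account to inspect_account, check lamports so an
// account deallocated earlier in the batch is reported Dead, not Alive.
fn classify(account: &AccountSharedData) -> AccountState {
    if account.lamports == 0 {
        AccountState::Dead
    } else {
        AccountState::Alive(account.clone())
    }
}
```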
Problem
simd83 intends to relax entry-level constraints, namely that a transaction which takes a write lock on an account cannot be batched with any other transaction which takes a read or write lock on it
before the locking code can be changed, however, svm must be able to support such batches. presently, if one transaction were to write to an account, and a subsequent transaction tried to read from or write to that account, svm would not pass the updated account state to the second transaction; it gets account states from accounts-db, which is not updated until after all transaction execution finishes
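The failure mode can be pictured with a toy per-batch cache (all names here are illustrative, not the PR's actual types): later transactions read committed results from the cache first and only fall back to the frozen accounts-db state on a miss.

```rust
use std::collections::HashMap;

type Pubkey = u64;

#[derive(Clone, Debug, PartialEq)]
struct Account {
    lamports: u64,
}

// Toy per-batch cache: execution results are committed here so later
// transactions in the same entry observe them, instead of re-reading the
// stale pre-batch state from accounts-db.
struct BatchCache {
    accounts_db: HashMap<Pubkey, Account>, // frozen until the batch ends
    cache: HashMap<Pubkey, Account>,       // updated after each transaction
}

impl BatchCache {
    fn load(&self, key: &Pubkey) -> Option<Account> {
        self.cache
            .get(key)
            .or_else(|| self.accounts_db.get(key))
            .cloned()
    }

    fn commit(&mut self, key: Pubkey, account: Account) {
        self.cache.insert(key, account);
    }
}
```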
Summary of Changes
in #3404, we refactored transaction processing to perform all steps for each transaction in sequence, and added the `AccountLoader` type to enable the future addition of an intermediate account cache for changed states. this pr adds that cache

all code changes are in the first commit, and the commit is delightfully small
im still minorly displeased at how much code we have to copy from account saver `collect_accounts_to_store`, because those functions also need the pre-execution transactions, the post-execution transactions, and accumulate multiple vectors of data. maybe we could make the logic in `update_accounts_for_executed_tx` et al runtime utility functions and have account saver do two passes through everything, but i think this kind of refactor is out of scope here

all other commits are changes to tests. the vast majority of new lines of code in this pr are new integration tests, but they are basically unchanged from #3146. the only important change to the integration tests is making the framework emulate how zero lamport accounts are deallocated. we dont reuse the code on `AccountLoader`, to guard against refactors introducing bugs in its own logic